Vectorized Higher Order Finite Difference Kernels

Author

  • Gerhard W. Zumbusch
Abstract

Several highly optimized implementations of Finite Difference schemes are discussed. The combination of vectorization and an interleaved data layout, spatial and temporal loop tiling algorithms, loop unrolling, and parameter tuning leads to efficient computational kernels in one to three spatial dimensions, with truncation errors of order two to twelve, and for isotropic and compact anisotropic stencils. The kernels are implemented on and tuned for several processor architectures: recent Intel Sandy Bridge, Ivy Bridge and AMD Bulldozer CPU cores, all with AVX vector instructions; Nvidia Kepler and Fermi and AMD Southern and Northern Islands GPU architectures; and some older architectures for comparison. The kernels are based either on a cache-aware spatial loop or on time-slicing to compute several time steps at once. Furthermore, vector components can be independent, grouped in short vectors of SSE, AVX or GPU warp size, or grouped in larger virtual vectors with explicit synchronization. The optimal choice of the algorithm and its parameters depends both on the Finite Difference stencil and on the processor architecture.
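As a rough illustration of the cache-aware spatial loop mentioned in the abstract, the following minimal sketch applies a fourth-order central-difference stencil for the second derivative in one dimension, once as a plain sweep and once split into spatial tiles. It is written in plain Python for readability and is not the paper's implementation: the function names, the `tile` parameter and its default value are assumptions made for this example, and the actual kernels rely on SIMD instructions, interleaved layouts and tuned parameters that are not shown here.

```python
import math

# Standard fourth-order central-difference weights for the second
# derivative, for grid offsets -2, -1, 0, +1, +2.
WEIGHTS = (-1.0 / 12.0, 4.0 / 3.0, -5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0)
RADIUS = 2  # stencil half-width


def apply_stencil(u, h, i):
    """Evaluate the stencil at interior index i of grid u with spacing h."""
    acc = 0.0
    for k, w in enumerate(WEIGHTS):
        acc += w * u[i + k - RADIUS]
    return acc / (h * h)


def second_derivative(u, h):
    """Reference version: one untiled sweep over all interior points."""
    out = [0.0] * len(u)
    for i in range(RADIUS, len(u) - RADIUS):
        out[i] = apply_stencil(u, h, i)
    return out


def second_derivative_tiled(u, h, tile=64):
    """Same sweep, split into blocks of `tile` points (hypothetical size).

    In a compiled kernel the block size would be chosen so that a block
    and its halo of RADIUS points on each side stay in cache.
    """
    n = len(u)
    out = [0.0] * n
    for start in range(RADIUS, n - RADIUS, tile):
        stop = min(start + tile, n - RADIUS)
        for i in range(start, stop):
            out[i] = apply_stencil(u, h, i)
    return out
```

Tiling only reorders the traversal, so both functions return the same values; in one dimension the benefit is negligible, and the technique pays off mainly in two and three dimensions, where neighbouring stencil points lie far apart in memory.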


Similar resources

A Compact Numerical Implementation for Solving Stokes Equations Using Matrix-vector Operations

In this work, a numerical scheme is implemented to solve the Stokes equations based on a cell-centered finite difference method over a staggered grid. In this scheme, all the difference operations have been vectorized, thereby eliminating loops. This is particularly important when using interpreted programming languages, e.g., MATLAB and Python. Using this scheme, the execution time becomes si...


Finite p-groups with few non-linear irreducible character kernels

Abstract. In this paper, we classify all finite p-groups with at most three non-linear irreducible character kernels.


Wave Equation Based Stencil Optimizations on Multi-core CPU

As the engine for seismic imaging algorithms, stencil kernels modeling wave propagation are both compute- and memory-intensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate...


A New Class of Spatial Covariance Functions Generated by Higher-order Kernels

Covariance functions and variograms play a fundamental role in exploratory analysis and statistical modelling of spatial and spatio-temporal datasets. In this paper, we construct a new class of spatial covariance functions using the Fourier transform of some higher-order kernels. Moreover, we extend this class of spatial covariance functions to the spatio-temporal setting using the idea used in...


A boundary element/finite difference analysis of subsidence phenomenon due to underground structures

Analysis of the stresses, displacements, and horizontal strains of the ground subsidence due to underground excavation in rocks can be accomplished by means of a hybridized higher order indirect boundary element/finite difference (BE/FD) formulation. A semi-infinite displacement discontinuity field is discretized (numerically) using the cubic displacement discontinuity elements (i.e. each highe...



Publication date: 2012